“If you can’t measure it, you can’t manage it” is often miss-attributed to Deming (1993Deming, W Edwards. 1993. “The New Economics for Industry, Government, Education.” MIT Press.), who was actually trying to make the opposite point (Berenson 2016Berenson, Robert A. 2016. “If You Can’t Measure Performance, Can You Improve It?” JAMA 315 (7): 645–46.). Regardless, service level indicators are an important part of researching, managing and seeking to improve transit operations (Fielding 1987Fielding, Gordon J. 1987. Managing public transit strategically: a comprehensive approach to strengthening service and monitoring performance. 1st ed. Jossey-Bass Public Administration Series. San Francisco: Jossey-Bass Publishers.; Ryus et al. 2003Ryus, Paul, M Connor, S Corbett, A Rodenstein, L Wargelin, L Ferreira, Y Nakanishi, and K Blume. 2003. “TCRP Report 88: A Guidebook for Developing a Transit Performance-Measurement System.” Transit Cooperative Research Program.). A wide range of indicators already exist. Examples include: those in the Transit Capacity and Quality of Service Manual (TCQSM)(Kittleson & Associates et al. 2013Kittleson & Associates, Parsons Brinckerhoff, KFH Group, Texas A&M Transportation Institute, and ARUP. 2013. Transit Capacity and Quality of Service Manual, Third Edition. Third Edition, TCRP Report 165. Washington DC: Transportation Research Board; Transportation Research Board. http://www.trb.org/Main/Blurbs/169437.aspx.), the Transit Score metric (Walk Score 2023Walk Score. 2023. “Transit Score Methodology.” https://www.walkscore.com/transit-score-methodology.shtml.) and many more.
Practitioners, researchers and advocates seeking to use such metrics may face two inter-related challenges: (1) there is the problem of calculating the metrics themselves for a specific location and service pattern; and (2) is the challenge of explaining the metrics, their meaning and importance to those who are not specialists in transit, such as to politicians or the general public1 Of the examples above, the metrics in the TCQSM appear: difficult to calculate in practice, and difficult to explain because there are a multitude of indicators (although there is an entire guidebook explaining them, which might help somewhat); while, in contrast, Transit Scores can be obtained simply by typing an address into a website, but cannot be calculated independently and lack a detailed (and open source) description of the methodolgy / algorithm.. However, a relatively simple transit Supply Index (SI) has been previous developed (Currie and Senbergs 2007Currie, Graham, and Zed Senbergs. 2007. “Identifying spatial gaps in public transport provision for socially disadvantaged Australians: the Melbourne needs-gap study.” Australasian Transport Research Forum.), This reduces service levels to a single score (like the Transit Score), but is open and can be independently calculated by anyone. It is obtained by calculating the number of transit arrivals at stops within an area of interest, with an adjustment made to account for the typical walk-access catchment for each stop. Hence, higher SI scores indicate areas with higher frequency and/or better coverage.
Unfortunately, the SI does not appear to have been widely used, perhaps in part because at the time it was first published timetable data was not typically publicly available in a standardized and machine-readable format. The scores reported in Currie and Senbergs (2007) has been calculated directly from a database of services provided by the transit authority in Melbourne, Australia. However, this database appears to have been in a format specific to Victoria and the agency itself2 Public Transport Victoria (PTV), and so if SI scores for a different location were desired3 Or perhaps even for Melbourne again, but at a different time given that the database format has likely changed and the Currie and Senbergs (2007) analysis algorithms are not readily available. considerable work would be required to wrangle the data and calculate scores.
However,
since the widespread adoption
of the General Transit Feed Specification (GTFS),
timetable data
and tools for processing it
are now much more widely available
than they were in 2007.
More than 10,000 agencies
are now providing GTFS feeds4 There are two forms:
GTFS-static consisting of the timetable data (the scheduled services);
and GTFS-realtime, which includes vehicle arrivals and departure times based on real-world position data.
This paper and project uses only the GTFS-static (timetable) format.(MobilityData undatedMobilityData. undated. General Transit Feed Specification (GTFS). {https://gtfs.org/}.),
suggesting that the development of code
to calculate SI scores directly from GTFS
might allow it to be used more widely
in research and practice.
Previous work undertaken
by Monash University’s Public Transport Research Group (PRTG),
available on Github (Reynolds 2023Reynolds, James. 2023. “Transit_supply_index_GTFS.” https://github.com/James-Reynolds/Transit_Supply_Index_GTFS.),
developed R functions to calculate SI scores
from the Victorian PTV GTFS feed.
However,
the code was inefficient5
To calculate SI scores for the Victorian GTFS feed
would take the original code base approximately as long as it the service delivery itself took
i.e. (SI scores fora whole week of transit service would take a whole week to compute!).
and untidy,
and only allowed scores to be calculated for a whole day of transit service.
More recently, PTRG has commenced a project which requires SI scores by hour of the day. Unfortunately, the current code does not have this capability, and major changes will be needed to calculate stop arrivals on an hourly, rather than daily, basis. However, this also provides an opportunity to revise and improve the code so that it is more readily accessible, usable and understandable for others, and quicker.
This document reports the development of code to calculate SI scores from GTFS datasets as an R package (Reynolds 2024Reynolds, James. 2024. “Gtfssupplyindex.” https://github.com/James-Reynolds/gtfssupplyindex.). Structuring the code as an R package6 Rather than as scripts within a R Markdown file, as for the original Transit_Supply_Index_GTFS efforts provides various advantages as far as adhering to standardised conventions, accessibility and sharing, testing and so on (Wickham and Bryan 2023Wickham, Hadley, and Jennifer Bryan. 2023. R Packages. " O’Reilly Media, Inc.". https://r-pkgs.org/.). More broadly, the motivation for this research is to better understand how GTFS data might be used to produce benchmarking metrics that can be calculated using open-source code. Such metrics might then be able to be used to assess proposed network changes or other analyses without the need for specialist software or bespoke calculation7 More formally, the (null) hypothesis tested in this research is that the Supply Index cannot be calculated directly from a GTFS feed.. A related objective is to increase the availability of metrics that are relatively easy to understand and use when making decisions about or advocating for changes to existing services, including for those who may not be technical specialists in transit planning and scheduling.
The rest of this document is structured as follows: the next section discusses the research context of transit metrics and the the Supply Index. In the third section the methodology adopted for the code development is outlined, including discussion of the case studies (GTFS feeds) used to test and verify the code output. In the fourth section results are presented, including SI scores for SA1s across Greater Melbourne on an hour-by-hour basis. Results are then discussed, followed by a brief conclusion that includes the identification of directions for future research.
Even a brief search shows that there is a very large number of metrics available for benchmarking transit services. Examples include: (1) those in the Transit Cooperative Research Program (TCRP) Report 88, which is an extensive guidebook on developing a performance-measurement system (Ryus et al. 2003Ryus, Paul, M Connor, S Corbett, A Rodenstein, L Wargelin, L Ferreira, Y Nakanishi, and K Blume. 2003. “TCRP Report 88: A Guidebook for Developing a Transit Performance-Measurement System.” Transit Cooperative Research Program.); (2) online databases provided by the Florida Transit Information System (FTIS) (Florida Transit Information System 2018Florida Transit Information System. 2018. “Urban Integrated National Transit Database.” http://www.ftis.org/urban_intd.aspx.) and International Association of Public Transport (UITP) (2015International Association of Public Transport (UITP). 2015. “Mobility in Cities Database 2015.” Brussels, Belgium. uitp.org/publications/mobility-in-cities-database/.); (3) those used in the extensive annual benchmarking programme undertaken yearly by the Transport Strategy Centre, which includes over 100 transit providers around the world (Imperial College London undatedImperial College London. undated. “Transport Strategy Centre (TSC); Applied Research.” undated. https://www.imperial.ac.uk/transport-engineering/transport-strategy-centre/applied-research/.); and (4) a recently developed methodology to calculate ‘blank spots’ within an area, being those places beyond 400/800 metre walking distances to/from bus and tram stops/train stations (Alamri et al. 2023Alamri, Sultan, Kiki Adhinugraha, Nasser Allheeib, and David Taniar. 2023. “GIS Analysis of Adequate Accessibility to Public Transportation in Metropolitan Areas.” ISPRS International Journal of Geo-Information 12 (5): 180.).
The Fielding Triangle (Fielding 1987Fielding, Gordon J. 1987. Managing public transit strategically: a comprehensive approach to strengthening service and monitoring performance. 1st ed. Jossey-Bass Public Administration Series. San Francisco: Jossey-Bass Publishers.) provides a framework for understanding how such metrics combine service inputs, service outputs and service consumption to describe cost efficiency, cost effectiveness or service effectiveness. At a larger scale, Litman (2003Litman, Todd. 2003. “Measuring Transportation: Traffic, Mobility and Accessibility.” 10. Vol. 73. Institute of Transportation Engineers. ITE Journal. Washington, D.C.: Institute of Transportation Engineers.) and Litman (2016Litman, Todd. 2016. “When Are Bus Lanes Warranted? Considering Economic Efficiency, Social Equity and Strategic Planning Goals.” Victoria Transport Policy Institute. http://www.vtpi.org/blw.pdf.) discuss some of the traffic, mobility, accessibility, social equity, strategic planning and other rational decision-making frames that might underlie such transit metrics, while Reynolds et al. (2017Reynolds, James, Graham Currie, Geoff Rose, and Alistair Cumming. 2017. “Moving Beyond Techno-Rationalism: New Models of Transit Priority Implementation.” In Australasian Transport Research Forum 2017. Auckland, New Zealand.) extends this into models of how institutionalism, incrementalism and other public policy models might apply to decision-making processes. Further examples include: (1) Guzman, Oviedo, and Rivera (2017Guzman, Luis A., Daniel Oviedo, and Carlos Rivera. 2017. “Assessing Equity in Transport Accessibility to Work and Study: The Bogotá Region.” Journal of Transport Geography 58: 236–46.), who develop a measure of accessibility in the context of policy development and social equity for Latin American Bus Rapid Transit (BRT) based networks; and (2) the street space allocation metrics based around 10 ethical principles introduced by Creutzig et al. (2020Creutzig, Felix, Aneeque Javaid, Zakia Soomauroo, Steffen Lohrey, Nikola Milojevic-Dupont, Anjali Ramakrishnan, Mahendra Sethi, et al. 2020. “Fair Street Space Allocation: Ethical Principles and Empirical Insights.” Transport Reviews 40 (6): 711–33. https://doi.org/10.1080/01441647.2020.1762795.).
However, many of these metrics appear difficult to calculate, complex to explain or understand, and likely not well suited to communication with those who are not transit planners or engineers, or otherwise technical specialists. Where pre-calculated metrics are immediately available it may not be possible for practitioners, researchers or advocates to independently generate metrics for proposed system changes or to even know exactly how scores for the existing services levels are calculated. For example, Transit Scores for locations with a published GTFS feed are readily available on a website, eliminating the need for any calculations. The meaning of these Transit Scores appears easy to explain, as the highest possible score of 100 represents what might be experienced in the centre of New York(Walk Score 2023Walk Score. 2023. “Transit Score Methodology.” https://www.walkscore.com/transit-score-methodology.shtml.)]. However, the Transit Score algorithm is patented and effectively a black box. It is not possible to calculate scores independently or understand how the metric might change with alteration to the transit system or services, or the surrounding environment. Transit Score, therefore, fails the first of the aforementioned challenges, as practitioners, researchers and advocates can only use those scores provided by Walk Score (2023Walk Score. 2023. “Transit Score Methodology.” https://www.walkscore.com/transit-score-methodology.shtml.) While the metric is simple to explain, as the closer to 100, the better, because it is based on a patented algorithm it may not be easy to understand or explain the connection between real-world conditions and the Transit Score, or what might need to be done to improve the score and service levels. Nor does it appear to be possible for Transit Scores to be generated for proposed changes to networks.
Another example is the TCQSM, which specifies Levels of Service (LOS) between A and F across a range of factors8 Including service span, frequency, speed, the proportion of the population serviced, competitiveness of travel times to car-based travel, and many more.. This scoring scheme appears relatively simple to explain9 A is good and F is bad. Also this scoring system matches the A to F LOS scoring used in many traffic capacity analysis software and manuals., and the detail within Kittleson & Associates et al. (2013Kittleson & Associates, Parsons Brinckerhoff, KFH Group, Texas A&M Transportation Institute, and ARUP. 2013. Transit Capacity and Quality of Service Manual, Third Edition. Third Edition, TCRP Report 165. Washington DC: Transportation Research Board; Transportation Research Board. http://www.trb.org/Main/Blurbs/169437.aspx.) provides a resource for anyone wanting to better understand what the scores mean. However, calculation of many of TCQSM metrics may need specialised software and datasets10 For example, the Service Coverage Area metric in the TCQSM (pp. 5-8 to 5-21) may require GIS or other analysis, on top of accurate data about population densities, stop locations and service schedules. and it might be challenging to explain the detail of these measures or how to improve them to non-technical decision-makers, stakeholders or others involved in transit management or advocacy.
The introduction of the General Transit Feed Specification (GTFS) and widespread release of schedule data in this format, however, has helped towards making transit metrics more broadly available and useable. GTFS is an open, text-based format that was developed originally to allow transit information to be included in the Google Maps navigation platform (MobilityData undatedMobilityData. undated. General Transit Feed Specification (GTFS). {https://gtfs.org/}.). The data structure is shown in the below figure.
GTFS entity relationship diagram. Source: adapted by author from Alamri et al (2023) and the GTFS Schedule Reference (16/11/2023 revision).
In the Entity Relationship Diargram (ERD) shown above, each box represents a database table in the GTFS, with table rows indicating the variables (columns) included in each11 For example, each record in the ‘stops’ table includes a value for stop_id, stop_name, stop_lat and stop_lon.. Relationships between the tables are indicated by the connecting lines, and Primary Key (PK) and Foreign Key (FK) designations12 For example, stop_id also appears in the ‘stop_times’ table as a Primary Key and Foreign Key.. `Crow’s feet’ indicate the relationships between each table13 See https://i.stack.imgur.com/fxaAq.png for guide to the symbols. But, for example, the stops table is required, with the stop_id field providing a unique (primary) key for every stop. Within the stop_times table (which is also required) the stop_id field is a foreign key. Each unique stop_id can appear many times in the stop_times table, but must appear only once in the stops table. In the stop_times table each combination of trip_id, stop_id and arrival time must be unique (But, see note 2!) meaning that these fields represent a composite key..
GTFS now provides a mechanism for including individual transit systems in many online products and analyseses, including the Transit Score metric itself. Wong (2013Wong, James. 2013. “Leveraging the General Transit Feed Specification for Efficient Transit Analysis.” Transportation Research Record 1 (2338): 11–19. https://doi.org/10.3141/2338-02.) provides another example of what can be done with GTFS data, having developed code to calculate of some of the TCQSM metrics14 Daily average headways, route length and stop numbers for 50 transit operators.. While the Wong (2013Wong, James. 2013. “Leveraging the General Transit Feed Specification for Efficient Transit Analysis.” Transportation Research Record 1 (2338): 11–19. https://doi.org/10.3141/2338-02.) open-source code is readily available15 https://github.com/jcwong86/GTFS_Explore_Tool this is now 11 years old and does not appear to be currently maintained. Future research may involve reviewing this code and using it to analyse modern GTFS feeds. However, in this paper the aim is more modest, being to use GTFS data to calculate Currie and Senbergs’ (2007) SI.
\[\begin{equation} SI_{area, time} = \sum{\frac{Area_{Bn}}{Area_{area}}*SL_{n, time}} \end{equation}\]
The Supply Index (SI) equation is shown in the margin figure16 Minor adjustments have been made to generalise the equation, as Currie and Senbergs (2007) focus was the context of Melbourne’s Census Collection Districts (CCD) and calculations based on a week of transit service. CCDs predate the introduction of Statistical Areas 1, 2, 3, and 4 (SA1, SA2, SA3, SA4), and other geographical divisions currently used by the Australian Bureau of Statistics (ABS), which may be more familiar to readers., in which: (1) \(SI_{area, time}\) is the Supply Index for the area of interest and a given period of time; (2) \(Area_{Bn}\) is the buffer area for each stop (n) within the area of interest. In Currie and Senbergs (2007) this was based on a radius of 400 metres for bus and tram stops, and 800 metres for railway stations; (3) \(Area_{area}\) is the area of the area of interest; and (4) \(SL_{n,time}\) is the number of transit arrivals for each stop for a given time period.
An advantage of the SI is that it is a relatively simple number to calculate, understand and explain. It describes the number of transit arrivals at stops within an area of interest and time frame, multiplied by a factor accounting for the proportion of the area of interest that is within typical walking distance of each stop. Hence, more services, more stops and higher frequencies would all increase the SI score. The SI does not incorporate service span, speed or other elements of a transit service. These may be important to passenger experience, but might add considerable complexity. Simplicity is also helped by the way that the Index is additive, in that \(SI_{area, time}\) scores can be aggregated to calculate an overall score across multiple time periods or for a region encompassing multiple areas of interest.
R (R Core Team 2023R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.), a widely used and readily available statistical programming language, was adopted for code development. The package development setup and workflow described by Wickham and Bryan (2023Wickham, Hadley, and Jennifer Bryan. 2023. R Packages. " O’Reilly Media, Inc.". https://r-pkgs.org/.) was adopted in this study.
Various existing packages were relied upon including: the sf package (Pebesma 2023Pebesma, Edzer. 2023. Sf: Simple Features for r. https://r-spatial.github.io/sf/.) for geospatial analysis; the tidyverse (Wickham et al. 2019Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.); gtfstools (Daniel Herszenhut et al. 2022Herszenhut, Daniel, Rafael H. M. Pereira, Pedro R. Andrade, and Joao Bazzo. 2022. Gtfstools: General Transit Feed Specification (GTFS) Editing and Analysing Tools. https://ipeagit.github.io/gtfstools/.); and tidytransit (Poletti et al. 2023Poletti, Flavio, Daniel Herszenhut, Mark Padgham, Tom Buckley, and Danton Noriega-Goodwin. 2023. Tidytransit: Read, Validate, Analyze, and Map GTFS Feeds. https://github.com/r-transit/tidytransit.). Some code was adapted from the tidytransit, gtfstools and other package’s examples, vignettes and other documentation. Tidytransit is licensed under GPL-2 | GPL-3, while gtfstools is lincensed under the more permissive MIT license. For simplicity, the gtfssupplyindex package developed here is licensed under GPL-3 so as to match tidytransit.
Mornington Penninsula SA1 zones and location of Mornington Tourist Rail stops.
Code development was undertaken using a number of cases for testing purposes, being: the Mornington Penninsula Tourist Railway, the New York subway, and Melbourne.
The Morning Penninsula Tourist Railway runs on Sundays and Wednesdays between Moorooduc and Mornington, with an intermediate stop at Tanti Park17 https://transitfeeds.com/p/mornington-railway/806/latest/stops. A GTFS feed from 2018 was selected. ABS data was also used, primarily through the strayr and absmapsdata packages (Mackey et al. 2023Mackey, Will, Matt Johnson, David Diviny, Matt Cowgill, Bryce Roney, William Lai, and Benjamin Wee. 2023. “Strayr.” https://runapp-aus.github.io/strayr/.). The Mornington Peninsular SA3 zone and the SA1 zones contained within were adopted as the areas_of_interest.
Tidytransit includes a sample GTFS feed from New York’s MTA
(including the subway!),
and so this was used for code tests were appropriate.
Larger scale testing was performed using the Victorian GTFS feed, published by Public Transport Victoria (PTV), sourced via Transit Mobility Data, (2023Transit Mobility Data,. 2023. “PTV GTFS - OpenMobilityData.” 2023. https://transitfeeds.com/p/ptv/497.) for historical feeds. Again, ABS data was used for the areas_of_interest.
Developed code is available and documented on github (Reynolds 2024Reynolds, James. 2024. “Gtfssupplyindex.” https://github.com/James-Reynolds/gtfssupplyindex.). The structure and functions used to generate each table are shown in the below Entity Relationship Diagram (ERD).
Entity Relationship Diagram (ERD) showing the data structure and functions
The code takes input from three files:
It outputs the si_by_area_day_and_hour table (bottom right), which reports the SI score for each hour of the day across dates specified by the user. The calculation of the SI scores is described in the following, which shows examples based on the Mornington Peninsula Tourist Railway GTFS and ABS datasets.
GTFS data is first loaded, with the gtfs_by_route_type function splitting this into a list of tidygtfs objects, with one list element for each route_type present in the gtfs data. This is achieved using the filter_by_route_type function from the gtfstools package (Danile Herszenhut et al. undatedHerszenhut, Danile, Rafael H. M. Pereira, Pedro R. Andrade, and Joao Bazzo. undated. Gtfstools; Filter GTFS Object by Route Type (Transport Mode). https://ipeagit.github.io/gtfstools/reference/filter_by_route_type.html.).
#load the revised mornington GTFS data
list_gtfs = gtfssupplyindex:::gtfs_by_route_type(system.file(
"extdata/mornington180109",
"gtfs.zip",
package = "gtfssupplyindex",
mustWork = TRUE))
names(list_gtfs) %>% kable(caption = "tidygtfs objects within list_gtfs")
tidygtfs objects within list_gtfs
| x |
|---|
| rail |
names(list_gtfs[[1]]) %>% kable(caption = "tidygtfs structure of first element of list_gtfs")
tidygtfs structure of first element of list_gtfs
| x |
|---|
| agency |
| calendar |
| calendar_dates |
| fare_attributes |
| fare_rules |
| feed_info |
| routes |
| shapes |
| stop_times |
| stops |
| trips |
| . |
Geographical data about the areas of interest are first loaded by the load_areas_of_interest.R function into an sf object, containing that area_id key and associated geometry.
areas_of_interest <- load_areas_of_interest(areas_of_interest = sf::st_read(system.file(
"extdata",
"mornington_sa12021.geojson",
package = "gtfssupplyindex",
mustWork = TRUE)),
area_id_field = "sa1_code_2021")
## Reading layer `mornington_sa12021' from data source
## `/Users/jreynold/Documents/0001_project/gtfssupplyindex/inst/extdata/mornington_sa12021.geojson'
## using driver `GeoJSON'
## Simple feature collection with 392 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 144.6514 ymin: -38.49738 xmax: 145.2615 ymax: -38.16239
## Geodetic CRS: WGS 84
head(areas_of_interest) %>% kable(caption = "First 6 entries in areas_of_interest table")
First 6 entries in areas_of_interest table
| area_id | geometry |
|---|---|
| 21402137701 | MULTIPOLYGON (((144.9617 -3… |
| 21402137702 | MULTIPOLYGON (((144.9916 -3… |
| 21402137703 | MULTIPOLYGON (((144.9923 -3… |
| 21402137704 | MULTIPOLYGON (((144.9878 -3… |
| 21402137705 | MULTIPOLYGON (((145.0015 -3… |
| 21402137706 | MULTIPOLYGON (((145.0088 -3… |
The data about buffer zones, specifically the walking distance threshold assigned to each route_type (mode) is then loaded, again through a function (load_buffer_zone.R).
buffer_distance <- gtfssupplyindex:::load_buffer_zones()
head(buffer_distance) %>% kable(caption = "First six entries in buffer_distance definitions")
First six entries in buffer_distance definitions
| route_type | buffer_distance | short_name |
|---|---|---|
| 0 | 400 | lrt |
| 1 | 800 | subway |
| 2 | 800 | rail |
| 3 | 400 | bus |
| 4 | 800 | ferry |
| 5 | 400 | cable_tram |
The stops_in_walk_dist function then generates a list, again with each element representing a route_type (mode), of stops_in_or_near_areas tables. Each entry in the tables consists of a stop_id and area_id that are within the buffer_distance threshold of each other, and the corresponding area related terms in the SI calculation (\(Area_{Bn} / Area_{Area}\).
stops_in_or_near_areas <- gtfssupplyindex:::stops_in_walk_dist(
list_gtfs = list_gtfs,
areas_of_interest = areas_of_interest,
buffer_distance = buffer_distance,
EPSG_for_transform = 28355
)
head(stops_in_or_near_areas[[1]]) %>% kable(caption = "'Rail' element of the stops_in_or_near_areas list for the Mornington Pennisula datasets, first six entries")
‘Rail’ element of the stops_in_or_near_areas list for the Mornington Pennisula datasets, first six entries
| stop_id | area_id | area_terms |
|---|---|---|
| 1388695887 | 21402159101 | 0.7999912 |
| 1388695887 | 21402159102 | 0.0168220 |
| 1388695887 | 21402159105 | 0.6779951 |
| 1388695887 | 21402159107 | 0.6453927 |
| 1388695887 | 21402159111 | 0.2011127 |
| 1388695887 | 21402159113 | 0.1081424 |
The variables passed to the stop_in_or_near_areas function are list_gtfs (the list of tidygtfs objects), areas_of_interest (the output of the load_areas_of_interest functions), the buffer_distance table, and an EPSG_for_transform variable. This last variable is the Coordinate Reference System (CRS) value relevant to the geographic location. It is used to project the latitude and longitude values included in the GTFS data and area_of_interest data into metres, in this case relating to the GDA94 / MGA zone 55 relevant to Australia18 https://epsg.io/28355.
As shown in the above ERD, the stops_in_walk_dist function takes the list (by route_type) of tidygtfs objects, the areas_of_interest table and the buffer_distance table as inputs. It outputs the stops_in_or_near_areas list (by route_type) of tables, which includes the area-related terms of the SI calculation.
The first part of the stops_in_walk_dist function looks up the buffer_distance_length specific to each route_type in the list (by route_type) of tidygtfs objects. This value is appended to each tidygtfs object as an addition element (named buffer_distance).
# add buffer_distance_length to list (by route) of tidy_gtfs
list_gtfs <- mapply(function(x, nm){
append(x,
(buffer_distance =
buffer_distance[
which(
buffer_distance$short_name %in% nm)
, "buffer_distance"]))},
list_gtfs, names(list_gtfs), SIMPLIFY=FALSE)
names(list_gtfs[[1]]) %>% kable(caption = "Updated tidygtfs, with added element for the corresponding buffer_distance")
Updated tidygtfs, with added element for the corresponding buffer_distance
| x |
|---|
| agency |
| calendar |
| calendar_dates |
| fare_attributes |
| fare_rules |
| feed_info |
| routes |
| shapes |
| stop_times |
| stops |
| trips |
| . |
| buffer_distance |
list_gtfs[[1]]$buffer_distance %>% kable(caption = "buffer_distance element")
buffer_distance element
| x |
|---|
| 800 |
Next the areas_of interest are transformed from latitude and longitude into metres, and the area is calculated.
# transform areas_of_interest to CRS in metres
areas_of_interest <- sf::st_transform(areas_of_interest, EPSG_for_transform)
# calculate Area_area terms
areas_of_interest$area_area <- sf::st_area(areas_of_interest)
head(areas_of_interest) %>% kable(caption = "Areas of first six SA1 zones in Mornington Penninsula SA3")
Areas of first six SA1 zones in Mornington Penninsula SA3
| area_id | geometry | area_area |
|---|---|---|
| 21402137701 | MULTIPOLYGON (((321847.7 57… | 317503.8 [m^2] |
| 21402137702 | MULTIPOLYGON (((324433.7 57… | 514954.3 [m^2] |
| 21402137703 | MULTIPOLYGON (((324483.5 57… | 526268.0 [m^2] |
| 21402137704 | MULTIPOLYGON (((324086.5 57… | 451389.2 [m^2] |
| 21402137705 | MULTIPOLYGON (((325290.3 57… | 2229459.7 [m^2] |
| 21402137706 | MULTIPOLYGON (((325890.1 57… | 281346.4 [m^2] |
The stops_in_walking_distance function then applies a sub-function, stops_in_walk_dist_one_route, to the list_gtfs (list of tidygtfs objects). First, a list of stops is extracted from the tidygtfs, using the stops_as_sf function from Poletti et al. (2023Poletti, Flavio, Daniel Herszenhut, Mark Padgham, Tom Buckley, and Danton Noriega-Goodwin. 2023. Tidytransit: Read, Validate, Analyze, and Map GTFS Feeds. https://github.com/r-transit/tidytransit.).
#variables passed by lapply
gtfs_single_route_type = list_gtfs[[1]]
buffer_distance_length = 800
#get stops as sf from the gtfs, listed by route_type, using stops_as_sf_function (see below)
stops_as_sf <- tidytransit::stops_as_sf(gtfs_single_route_type$stops)
#drop everything except geometry and stop_id
stops_as_sf <- stops_as_sf %>% dplyr::select(stop_id, geometry)
# map the stops onto the areas_of_interest for the first element in the list
#map +
# geom_sf(data=list_of_stops_as_sf[[1]], size = 2)
head(stops_as_sf) %>% kable(caption = "Stops extracted from Mornington GTFS")
Stops extracted from Mornington GTFS
| stop_id | geometry |
|---|---|
| 1388695887 | POINT (145.0503 -38.22985) |
| 1452182324 | POINT (145.0669 -38.22371) |
| 650916735 | POINT (145.107 -38.21405) |
Circles are then drawn around each stop, with the radius equal to the buffer distance. Intersecting these circles with the areas_of_interest then outputs an sf showing the catchment of each stop within each area of interest
Catchment of each stop within each area of interest
# transform stops to CRS in metres
stops_as_sf <- stops_as_sf %>% sf::st_transform(crs = EPSG_for_transform)
#draw radius around stops of the buffer zone
circles_around_stops <- stops_as_sf %>% sf::st_buffer(dist = buffer_distance_length)
# Intersect the circles
stops_in_or_near_areas <- sf::st_intersection(areas_of_interest, circles_around_stops)
ggplot() +
geom_sf(data=stops_in_or_near_areas, aes(fill = area_id)) +
theme(legend.position = "none")
Finally, the \(area_{Bn}\) terms are calculated for each combination of stop_id and area_id,
# calculate the area_BN_term and drop geometry
stops_in_or_near_areas$area_Bn <- sf::st_area(stops_in_or_near_areas)
stops_in_or_near_areas <- stops_in_or_near_areas %>% sf::st_drop_geometry()
# join areas_of_interest areas (but not the area geometry)
stops_in_or_near_areas <- dplyr::left_join(stops_in_or_near_areas, areas_of_interest %>% sf::st_drop_geometry())
# calculate combined area terms
stops_in_or_near_areas$area_terms <- stops_in_or_near_areas$area_Bn / stops_in_or_near_areas$area_area
# drop units
stops_in_or_near_areas$area_terms <- as.numeric(stops_in_or_near_areas$area_terms)
head(stops_in_or_near_areas %>% dplyr::select(stop_id, area_id, area_terms)) %>%
kable(caption = "Output of stops_in_walk_dist_one_route for Mornington Peninsula, first six entries")
Output of stops_in_walk_dist_one_route for Mornington Peninsula, first six entries
| stop_id | area_id | area_terms |
|---|---|---|
| 1388695887 | 21402159101 | 0.7999912 |
| 1388695887 | 21402159102 | 0.0168220 |
| 1388695887 | 21402159105 | 0.6779951 |
| 1388695887 | 21402159107 | 0.6453927 |
| 1388695887 | 21402159111 | 0.2011127 |
| 1388695887 | 21402159113 | 0.1081424 |
As discussed above, the SI is additive, so the si_by_area_day_and_hour table can be calculated from a si_by_mode_and_time table containing individual SI scores by mode19 SI calculations must be on a per mode basis up until the final aggregation because individual modes may have different buffer distances. for individual periods of time.
The buffer_distance table draws data from Comma Separated Variables (CSV) file, based on the Extended GTFS Route Types definitions20 https://developers.google.com/transit/gtfs/reference/extended-route-types, with defaults of 800 metres and 400 metres for heavy rail and all other modes respectively.
The areas_of_interest table stores geometric information about the areas for which SI scores are desired, as simple features21 See https://cran.r-project.org/web/packages/sf/vignettes/sf1.html. The stops_in_or_near_areas and si_by_mode_and_time tables are used to store components of the SI during the calculations, which are described in the following.
CODE references (Poletti undatedPoletti, Flavio. undated. Tidytransit: Generate a Departure Timetable. https://r-transit.github.io/tidytransit/articles/timetable.html.)
In the previous section the various functions were outlined, with sample output generated for the Mornington Peninsula Tourist Railway feed and the ABS SA1 zones in the Mornington Peninsula. The following sections show outputs of the various functions for different input data, as follows:
Testing the code with the code with the same Mornington Peninsula Tourist Railway GTFS, but SA2 zones instead of SA1 provides the following results^[Here results are verbose, including the intermediate tables and plots so we can see what is happening step-by-step. Formatting is a mess, but: the first tibble (6x3) shows the buffer_distance table; then there are the names of the list elements, showing that the buffer_distance element has been added. This is shown as 800m on the next line. The areas_of_interest table is then shown (starting at “Simple feature collection…”), then the stop_id locations. After the charts the stops_in_or_near_areas table is shown.
## # A tibble: 6 × 3
## route_type buffer_distance short_name
## <int> <dbl> <chr>
## 1 0 400 lrt
## 2 1 800 subway
## 3 2 800 rail
## 4 3 400 bus
## 5 4 800 ferry
## 6 5 400 cable_tram
## [1] "agency" "calendar" "calendar_dates" "fare_attributes"
## [5] "fare_rules" "feed_info" "routes" "shapes"
## [9] "stop_times" "stops" "trips" "."
## [13] "buffer_distance"
## [1] 800
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 294637.9 ymin: 5736943 xmax: 345893.9 ymax: 5774458
## Projected CRS: GDA94 / MGA zone 55
## area_id geometry area_area
## 1 214021377 MULTIPOLYGON (((324630.6 57... 45054608 [m^2]
## 2 214021378 MULTIPOLYGON (((312893 5749... 287352536 [m^2]
## 3 214021379 MULTIPOLYGON (((343791.7 57... 108998448 [m^2]
## 4 214021381 MULTIPOLYGON (((333363.3 57... 23203114 [m^2]
## 5 214021382 MULTIPOLYGON (((327376.5 57... 30221324 [m^2]
## 6 214021383 MULTIPOLYGON (((313323.4 57... 67190425 [m^2]
## Simple feature collection with 3 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 329348.7 ymin: 5766885 xmax: 334273.1 ymax: 5768742
## Projected CRS: GDA94 / MGA zone 55
## # A tibble: 3 × 2
## stop_id geometry
## <chr> <POINT [m]>
## 1 1388695887 (329348.7 5766885)
## 2 1452182324 (330784.2 5767597)
## 3 650916735 (334273.1 5768742)
Catchment of each stop within each area of interest, Mornington Penninsula Railway and SA2 zones
Catchment of each stop within each area of interest, by stop, Mornington Penninsula Railway and SA2 zones
Catchment of each stop within each area of interest, by stop (x) and area of interest (y), Mornington Penninsula Railway and SA2 zones
## stop_id area_id area_terms
## 1 1388695887 214021591 1.070441e-01
## 2 1388695887 214021592 6.849647e-02
## 3 1452182324 214021381 6.723449e-05
## 4 1452182324 214021591 1.366432e-01
## 5 650916735 214021381 5.222900e-02
## 6 650916735 214021385 6.796988e-03
This appears to meet expectations, with fewer SA2 zones within the walking distance catchment of the stops. As well, the area terms indicate that only a small fraction of each of the SA2 zones are within the catchment areas of the stops.
Testing the code with the code with the same Mornington Peninsula Tourist Railway GTFS, but instead combining the areas_of_interest so that we are only looking at the Mornington Peninsular SA3 zone (in its entirety) gives22 Verbose set to false in this instance:
Catchment of each stop within Mornington Penninsula SA3 zone
|
Again, this appears to meet expectations, with all three stops falling within the same SA3 zone.
Checking the area terms calculation is relatively trivial for this case, as the ABS maps data comes with an area field
## Simple feature collection with 1 feature and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 144.6514 ymin: -38.49884 xmax: 145.2615 ymax: -38.16239
## Geodetic CRS: WGS 84
## sa3_code_2021 sa3_name_2021 sa4_code_2021 sa4_name_2021
## 1 21402 Mornington Peninsula 214 Mornington Peninsula
## gcc_code_2021 gcc_name_2021 state_code_2021 state_name_2021 areasqkm_2021
## 1 2GMEL Greater Melbourne 2 Victoria 724.1681
## cent_lat cent_long geometry
## 1 -38.34165 145.0355 MULTIPOLYGON (((145.2129 -3...
## [1] 0.002776454 NA
There appears to be some minor rounding errors, but this appears to related to the ABS reported square_km (724.1681) being slightly different from the m2 reported by the st_area function (724,272,059). Likely this is related to the projections used and way that the ABS got to their square_km value (unknown).
The north of the Melbourne CBD was selected for testing the code as it is may be familiar to readers, and has three different transit modes. The SA1 and SA2 zone boundaries are shown below.
SA1 zone boundaries within the Melbourne CBD - North SA2, Source: ABS
Trains stop at Melbourne Central and Flagstaff stations. Trams run along Elizabeth, Swanston, William, La Trobe and other streets. There are also bus services throughout the area, including along Queen Street.
## # A tibble: 6 × 3
## route_type buffer_distance short_name
## <int> <dbl> <chr>
## 1 0 400 lrt
## 2 1 800 subway
## 3 2 800 rail
## 4 3 400 bus
## 5 4 800 ferry
## 6 5 400 cable_tram
## [1] "agency" "calendar" "calendar_dates" "routes"
## [5] "shapes" "stop_times" "stops" "trips"
## [9] "buffer_distance"
## [1] 400
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 320024 ymin: 5813125 xmax: 320518.2 ymax: 5813647
## Projected CRS: GDA94 / MGA zone 55
## area_id geometry area_area
## 1 20604150401 MULTIPOLYGON (((320518.2 58... 2219.743 [m^2]
## 2 20604150402 MULTIPOLYGON (((320211.3 58... 2507.320 [m^2]
## 3 20604150403 MULTIPOLYGON (((320067.9 58... 11618.113 [m^2]
## 4 20604150404 MULTIPOLYGON (((320423.5 58... 1346.065 [m^2]
## 5 20604150405 MULTIPOLYGON (((320473.8 58... 9535.895 [m^2]
## 6 20604150406 MULTIPOLYGON (((320420.8 58... 5824.552 [m^2]
## Simple feature collection with 6 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 314938 ymin: 5807445 xmax: 327889 ymax: 5817658
## Projected CRS: GDA94 / MGA zone 55
## stop_id geometry
## 1 10311 POINT (326577 5807614)
## 2 10371 POINT (326301 5807651)
## 3 1083 POINT (314938 5817658)
## 4 11285 POINT (326069 5807686)
## 5 1185 POINT (327889 5807445)
## 6 12599 POINT (325817 5807721)
Catchment of each stop within each area of interest, Melbourne CBD - North SA1 zones, trams
Catchment of each stop within each area of interest, Melbourne CBD - North SA1 zones, trams, by stop, first six stops
## stop_id area_id area_terms
## 1 17591 20604150409 1.00000000
## 2 17591 20604150417 0.89041997
## 3 17591 20604150424 0.06524449
## 4 17591 20604150430 0.05187639
## 5 17593 20604150409 0.99026393
## 6 17593 20604150417 1.00000000
## Simple feature collection with 6 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 298792 ymin: 5751165 xmax: 344673 ymax: 5857015
## Projected CRS: GDA94 / MGA zone 55
## # A tibble: 6 × 2
## stop_id geometry
## <chr> <POINT [m]>
## 1 15351 (299316 5838456)
## 2 15353 (298792 5833122)
## 3 17204 (323490 5857015)
## 4 19827 (344673 5751165)
## 5 19828 (343101 5752035)
## 6 19829 (341813 5753352)
Catchment of each stop within each area of interest, Melbourne CBD - North SA1 zones, rail
Catchment of each stop within each area of interest, Melbourne CBD - North SA1 zones, rail, by stop, first six stops
## stop_id area_id area_terms
## 1 19841 20604150401 1
## 2 19841 20604150402 1
## 3 19841 20604150403 1
## 4 19841 20604150404 1
## 5 19841 20604150405 1
## 6 19841 20604150406 1
## Simple feature collection with 6 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 287976 ymin: 5820461 xmax: 325356 ymax: 5827395
## Projected CRS: GDA94 / MGA zone 55
## stop_id geometry
## 1 10001 POINT (304018 5822149)
## 2 10002 POINT (287976 5827395)
## 3 10009 POINT (304034 5820537)
## 4 1000 POINT (325356 5825537)
## 5 10010 POINT (304404 5820496)
## 6 10011 POINT (304703 5820461)
Catchment of each stop within each area of interest, Melbourne CBD - North SA1 zones, bus
Catchment of each stop within each area of interest, Melbourne CBD - North SA1 zones, bus, by stop, first six stops
## stop_id area_id area_terms
## 1 19249 20604150409 0.9645607
## 2 19249 20604150412 1.0000000
## 3 19249 20604150414 0.6407352
## 4 19249 20604150417 0.6741177
## 5 19249 20604150418 0.8122632
## 6 19249 20604150422 1.0000000
In general, the results appear to meet expectations. Looking first at the tram stop results, the first stop has a stop id of 17591. This stop is located at Russell and Bourke Street23 https://transitfeeds.com/p/ptv/497/latest/stops?q=17591. A 400 metres catchment overlaid on this stop is shown in the margin24 Note: Roughly drawn from the ABS maps.. This matches the code output shown above (and duplicated in the margin).
SA1 zone boundaries within the Melbourne CBD - North SA2, overlayed with 400m radius from Russell and Bourke Street. Source: ABS and author
Area mapping for stop 17591 (Russel and Bourke), duplicated from above verbose code output
The rail results appear to meet expectations, with the catchment of Melbourne Central Station (top, centre) covering all of the CBD - North SA2; Flagstaff (top, left) and Parliament (top, right) covering part of the SA2, while Flinders Street having two stop entries, one for Vline and one for Metro, which are just within 800m of a sliver of the SA2.